{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9fd54a32",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/openai.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e3a8796-edc8-43f2-94ad-fe4fb20d70ed",
   "metadata": {},
   "source": [
    "# Baseten Cookbook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f60c6b80",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index llama-index-llms-baseten"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e5761a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.baseten import Baseten"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b007403c-6b7a-420c-92f1-4171d05ed9bb",
   "metadata": {},
   "source": [
    "## Model APIs vs. Dedicated Deployments\n",
    "\n",
    "Baseten offers two main ways to run inference.\n",
    "1. Model APIs are public endpoints for popular open-source models (GPT-OSS, Kimi K2, DeepSeek, etc.). You use a frontier model directly via its slug, e.g. `deepseek-ai/DeepSeek-V3-0324`, and are charged on a per-token basis. You can find the list of supported models here: https://docs.baseten.co/development/model-apis/overview#supported-models.\n",
    "\n",
    "2. Dedicated deployments are useful for serving custom models where you want to autoscale production workloads and have fine-grained configuration. You need to deploy a model in your Baseten dashboard and provide the 8-character model ID, like `abcd1234`.\n",
    "\n",
    "By default, the `model_apis` parameter is `True`. To use a dedicated deployment, set `model_apis` to `False` when instantiating the `Baseten` object."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf5ddd8b",
   "metadata": {},
   "source": [
    "#### Instantiation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03faa599",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model APIs: find the model slug here: https://docs.baseten.co/development/model-apis/overview#supported-models\n",
    "llm = Baseten(\n",
    "    model_id=\"MODEL_SLUG\",\n",
    "    api_key=\"YOUR_API_KEY\",\n",
    "    model_apis=True,  # Default, so not strictly necessary\n",
    ")\n",
    "\n",
    "# Dedicated deployments: find the model_id in the Baseten dashboard here: https://app.baseten.co/overview\n",
    "llm = Baseten(\n",
    "    model_id=\"MODEL_ID\",\n",
    "    api_key=\"YOUR_API_KEY\",\n",
    "    model_apis=False,\n",
    ")"
   ]
  },
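  {
   "cell_type": "markdown",
   "id": "7c1e4a90",
   "metadata": {},
   "source": [
    "The choice between the two modes can be sketched as a small helper. This is only an illustration: `baseten_kwargs` is a hypothetical function, not part of the integration, and the rule it applies (a Model API slug contains a `/`, a dedicated deployment ID is 8 alphanumeric characters) is an assumption drawn from the examples above. The result could be unpacked as `Baseten(**baseten_kwargs(...))`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c1e4a91",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "\n",
    "def baseten_kwargs(model: str, api_key: str) -> dict:\n",
    "    \"\"\"Build keyword arguments for Baseten(...) (hypothetical helper).\n",
    "\n",
    "    Assumption: slugs look like \"org/name\"; deployment IDs are 8 alphanumerics.\n",
    "    \"\"\"\n",
    "    if \"/\" in model:\n",
    "        # Looks like a Model API slug, so keep the default mode\n",
    "        return {\"model_id\": model, \"api_key\": api_key, \"model_apis\": True}\n",
    "    if re.fullmatch(r\"[A-Za-z0-9]{8}\", model):\n",
    "        # Looks like a dedicated deployment ID\n",
    "        return {\"model_id\": model, \"api_key\": api_key, \"model_apis\": False}\n",
    "    raise ValueError(f\"Unrecognized model identifier: {model!r}\")\n",
    "\n",
    "\n",
    "print(baseten_kwargs(\"deepseek-ai/DeepSeek-V3-0324\", \"YOUR_API_KEY\")[\"model_apis\"])\n",
    "print(baseten_kwargs(\"abcd1234\", \"YOUR_API_KEY\")[\"model_apis\"])"
   ]
  },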
  {
   "cell_type": "markdown",
   "id": "4a9a2e45",
   "metadata": {},
   "source": [
    "#### Call `complete` with a prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60be18ae-c957-4ac2-a58a-0652e18ee6d6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paul Graham is a British-American entrepreneur, essayist, and programmer, best known for co-founding the startup accelerator **Y Combinator (YC)** and for his influential essays on technology, startups, and philosophy. Here are some key highlights about him:\n",
      "\n",
      "### **Background & Career**\n",
      "- Born in 1964 in England, Graham studied at **Cornell University** and earned a PhD in **Computer Science** from **Harvard**.\n",
      "- He created **Viaweb** (1995), the first web-based application, which was later acquired by Yahoo! in 1998 and became **Yahoo! Store**.\n",
      "- Co-founded **Y Combinator (2005)** with Jessica Livingston, Robert Morris, and Trevor Blackwell. YC has funded companies like **Airbnb, Dropbox, Stripe, Reddit, and DoorDash**.\n",
      "\n",
      "### **Writing & Influence**\n",
      "- Known for his **essays** on startups, technology, and life philosophy (hosted on his website [paulgraham.com](http://www.paulgraham.com)).\n",
      "- Popular essays include:\n",
      "  - *\"How to Start a Startup\"*  \n",
      "  - *\"Do Things That Don't Scale\"*  \n",
      "  - *\"The Hardest Lessons for Start\n"
     ]
    }
   ],
   "source": [
    "llm_response = llm.complete(\"Paul Graham is\")\n",
    "print(llm_response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14831268-f90f-499d-9d86-925dbc88292b",
   "metadata": {},
   "source": [
    "#### Call `chat` with a list of messages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bbe29574-4af1-48d5-9739-f60652b6ce6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"system\", content=\"You are a pirate with a colorful personality\"\n",
    "    ),\n",
    "    ChatMessage(role=\"user\", content=\"What is your name\"),\n",
    "]\n",
    "resp = llm.chat(messages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9cbd550a-0264-4a11-9b2c-a08d8723a5ae",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "assistant: Arrr, matey! I be known as Captain Crimsonbeard—though me beard be more fiery red than crimson, truth be told! A pirate of legend, scourge of the seven memes, and connoisseur of questionable life choices. But ye can call me Cap’n if ye like, or \"That Weird Pirate Who Won’t Stop Talking About Pineapples.\" Now, what mischief brings ye to me ship today? 🏴‍☠️🍍\n"
     ]
    }
   ],
   "source": [
    "print(resp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ed5e894-4597-4911-a623-591560f72b82",
   "metadata": {},
   "source": [
    "## Streaming"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cb7986f-aaed-42e2-abdd-f274f6d4fc59",
   "metadata": {},
   "source": [
    "Using the `stream_complete` method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d43f17a2-0aeb-464b-a7a7-732ba5e8ef24",
   "metadata": {},
   "outputs": [],
   "source": [
    "resp = llm.stream_complete(\"Paul Graham is \")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0214e911-cf0d-489c-bc48-9bb1d8bf65d8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paul Graham is a British-American entrepreneur, essayist, and venture capitalist, best known as a co-founder of **Y Combinator**, a highly influential startup accelerator that has helped launch companies like Airbnb, Dropbox, Stripe, and Reddit.  \n",
      "\n",
      "### Key Facts About Paul Graham:  \n",
      "1. **Early Career**: Originally a programmer, he developed **Viaweb**, one of the first web-based applications, which was acquired by Yahoo! in 1998 and became Yahoo! Store.  \n",
      "2. **Y Combinator**: In 2005, he co-founded Y Combinator with Jessica Livingston, Robert Morris, and Trevor Blackwell. It pioneered the \"seed accelerator\" model, providing funding and mentorship to early-stage startups.  \n",
      "3. **Essays**: Graham is known for his insightful essays on startups, technology, and life philosophy, available on his website ([paulgraham.com](http://www.paulgraham.com)). Popular ones include *\"How to Get Startup Ideas\"* and *\"Do Things That Don't Scale.\"*  \n",
      "4. **Investments**: Through YC, he has backed thousands of startups, shaping Silicon Valley’s tech landscape.  \n",
      "5. **Lisp Advocate**: A proponent of the Lisp programming language,"
     ]
    }
   ],
   "source": [
    "for r in resp:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40350dd8-3f50-4a2f-8545-5723942039bb",
   "metadata": {},
   "source": [
    "Using the `stream_chat` method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc636e65-a67b-4dcd-ac60-b25abc9d8dbd",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"system\", content=\"You are a pirate with a colorful personality\"\n",
    "    ),\n",
    "    ChatMessage(role=\"user\", content=\"What is your name\"),\n",
    "]\n",
    "resp = llm.stream_chat(messages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4475a6bc-1051-4287-abce-ba83324aeb9e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Arrr, me name be Captain Crimsonbeard! A fearsome and flamboyant pirate with a beard as red as the setting sun and a wardrobe brighter than a treasure chest full o’ jewels! I sail the seven seas in search of adventure, gold, and the finest rum—always with a dramatic flair and a twinkle in me eye. \n",
      "\n",
      "What be yer name, matey? Or shall I just call ye \"Lucky Crewmember\" for now? *winks and adjusts my feathered hat*"
     ]
    }
   ],
   "source": [
    "for r in resp:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0248c57",
   "metadata": {},
   "source": [
    "## Async\n",
    "Async operations are useful for long-running inference tasks that may hit request timeouts, for batch inference jobs, and for prioritizing certain requests.\n",
    "\n",
    "(1) In the integration, the `acomplete` async function is implemented using aiohttp, an asynchronous HTTP client library for Python. The function invokes `async_predict` at the appropriate Baseten model endpoint; if the call succeeds, the user receives a response containing the `request_id`. The user can then check the status of the `async_predict` request, or cancel it, using the returned `request_id`.\n",
    "\n",
    "(2) Once the model finishes executing the request, the async result is posted to the user-provided webhook endpoint. That endpoint is responsible for validating the webhook signature for security, then processing and storing the output.\n",
    "\n",
    "Baseten: Get `request_id` → result is posted to webhook\n",
    "\n",
    "##### Note: Async is only available for dedicated deployments, not for Model APIs. `achat` is not supported because chat does not make sense for async operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d5e2e01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "35643965636d4c3da6f54b5c3b354aa0\n"
     ]
    }
   ],
   "source": [
    "async_llm = Baseten(\n",
    "    model_id=\"YOUR_MODEL_ID\",\n",
    "    api_key=\"YOUR_API_KEY\",\n",
    "    model_apis=False,  # Async is only available for dedicated deployments\n",
    "    webhook_endpoint=\"YOUR_WEBHOOK_ENDPOINT\",\n",
    ")\n",
    "response = await async_llm.acomplete(\"Paul Graham is\")\n",
    "print(response)  # This is the request id"
   ]
  },
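  {
   "cell_type": "markdown",
   "id": "5d2b8f30",
   "metadata": {},
   "source": [
    "Step (2) above leaves signature validation to the webhook endpoint. Here is a minimal sketch of that check. It assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret and sent in a header as one or more comma-separated `v1=<signature>` values. Confirm the header name and exact format in the Baseten webhook documentation before relying on it. `is_valid_signature` is a hypothetical helper name, and the values in the self-check are made up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d2b8f31",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hashlib\n",
    "import hmac\n",
    "\n",
    "\n",
    "def is_valid_signature(body: bytes, header_value: str, secret: str) -> bool:\n",
    "    \"\"\"Return True if any signature in the header matches the body (sketch).\"\"\"\n",
    "    expected = hmac.new(secret.encode(\"utf-8\"), body, hashlib.sha256).hexdigest()\n",
    "    for candidate in header_value.split(\",\"):\n",
    "        candidate = candidate.strip()\n",
    "        if candidate.startswith(\"v1=\"):  # assumed version prefix\n",
    "            candidate = candidate[len(\"v1=\") :]\n",
    "        # compare_digest avoids leaking information through timing differences\n",
    "        if hmac.compare_digest(expected.encode(\"utf-8\"), candidate.encode(\"utf-8\")):\n",
    "            return True\n",
    "    return False\n",
    "\n",
    "\n",
    "# Self-check with made-up values; no network access is needed.\n",
    "demo_secret = \"example-secret\"\n",
    "demo_body = b'{\"request_id\": \"abc\"}'\n",
    "demo_header = \"v1=\" + hmac.new(\n",
    "    demo_secret.encode(\"utf-8\"), demo_body, hashlib.sha256\n",
    ").hexdigest()\n",
    "print(is_valid_signature(demo_body, demo_header, demo_secret))\n",
    "print(is_valid_signature(demo_body, \"v1=deadbeef\", demo_secret))"
   ]
  },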
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb54a4d0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'request_id': '35643965636d4c3da6f54b5c3b354aa0', 'model_id': 'yqvr2lxw', 'deployment_id': '31kmg1w', 'status': 'SUCCEEDED', 'webhook_status': 'SUCCEEDED', 'created_at': '2025-03-27T00:17:51.578558Z', 'status_at': '2025-03-27T00:18:38.768572Z', 'errors': []}\n"
     ]
    }
   ],
   "source": [
    "\"\"\"\n",
    "This will return the status information of a request using an async_predict request's request_id and the model_id the async_predict request was made with.\n",
    "\"\"\"\n",
    "\n",
    "import os\n",
    "\n",
    "import requests\n",
    "\n",
    "model_id = \"YOUR_MODEL_ID\"\n",
    "request_id = \"YOUR_REQUEST_ID\"\n",
    "# Read the API key from the environment, falling back to a placeholder\n",
    "baseten_api_key = os.environ.get(\"BASETEN_API_KEY\", \"YOUR_API_KEY\")\n",
    "\n",
    "resp = requests.get(\n",
    "    f\"https://model-{model_id}.api.baseten.co/async_request/{request_id}\",\n",
    "    headers={\"Authorization\": f\"Api-Key {baseten_api_key}\"},\n",
    ")\n",
    "\n",
    "print(resp.json())"
   ]
  }
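  ,
  {
   "cell_type": "markdown",
   "id": "e8a47c20",
   "metadata": {},
   "source": [
    "The status check above can be wrapped in a polling loop. A minimal sketch follows, with the status lookup passed in as a callable so the loop can be exercised without network access. `wait_for_async_request` is a hypothetical helper. `SUCCEEDED` appears in the sample output above; the other status names used here are assumptions to verify against the Baseten API reference. In practice, `fetch_status` could wrap the `requests.get(...).json()` call from the previous cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8a47c21",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from typing import Callable\n",
    "\n",
    "# Assumption: only SUCCEEDED is confirmed by the sample output above.\n",
    "TERMINAL_STATUSES = {\"SUCCEEDED\", \"FAILED\", \"EXPIRED\", \"CANCELED\"}\n",
    "\n",
    "\n",
    "def wait_for_async_request(\n",
    "    fetch_status: Callable[[], dict],\n",
    "    poll_seconds: float = 2.0,\n",
    "    max_polls: int = 30,\n",
    ") -> dict:\n",
    "    \"\"\"Call fetch_status until it reports a terminal status or polls run out.\"\"\"\n",
    "    for _ in range(max_polls):\n",
    "        info = fetch_status()\n",
    "        if info.get(\"status\") in TERMINAL_STATUSES:\n",
    "            return info\n",
    "        time.sleep(poll_seconds)\n",
    "    raise TimeoutError(f\"No terminal status after {max_polls} polls\")\n",
    "\n",
    "\n",
    "# Stand-in for the HTTP status call, so this cell runs offline.\n",
    "fake_responses = iter([{\"status\": \"IN_PROGRESS\"}, {\"status\": \"SUCCEEDED\"}])\n",
    "print(wait_for_async_request(lambda: next(fake_responses), poll_seconds=0))"
   ]
  }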
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
